43 research outputs found

    Formalization of the Czech morphology system with respect to automatic processing of Czech texts

    Detailed morphological description of word forms represents one of the most important conditions of successful automatic processing of linguistic data. The system of categories and their values used for the description is the subject of the first part of the thesis. The basic principle, the so-called Golden rule of morphology, states that every word form has to be described by the system unambiguously. The existence of variants of word forms and whole paradigms, however, complicates the accomplishment of this rule. We introduce so-called mutations as an extension of the variants, to be able to include other sets of word forms with the same description (for instance, multiple word forms of Czech personal pronouns). We divide mutations into two kinds: global ones, describing all word forms of a paradigm, and inflectional ones, for description at the word-form level. This division enables us to express their various combinations. We do not use features of style for the mutation division, for they are subjective. With consistent use of the categories called Inflectional Mutation and Global Mutation, the Golden rule of morphology will always be valid. The concept of multiple lemma, which describes lemma variants, is introduced in a chapter dealing with lemmatization. We give a detailed description of so-called compounds, i.e. word forms such as proň and koupilas, and also use the concept of multiple lemma for their lemmatization. We divide them into several types according to the parts of speech of their components. We also address the problem of searching for them in...
    Institute of the Czech National Corpus (Ústav českého národního korpusu), Faculty of Arts (Filozofická fakulta)

    Machine Translation of Medical Texts in the Khresmoi Project

    The WMT 2014 Medical Translation Task poses an interesting challenge for Machine Translation (MT). In the standard translation task, the end application is the translation itself. In this task, the MT system is considered a part of a larger system for cross-lingual information retrieval (IR).

    Adaptation of machine translation for multilingual information retrieval in the medical domain

    Objective. We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve the effectiveness of cross-lingual IR. Methods and Data. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech–English, German–English, and French–English. MT quality is evaluated on data sets created within the Khresmoi project, and IR effectiveness is tested on the CLEF eHealth 2013 data sets. Results. The search query translation results achieved in our experiments are outstanding: our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech–English, from 23.03 to 40.82 for German–English, and from 32.67 to 40.82 for French–English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French–English. For Czech–English and German–English, the increased MT quality does not lead to better IR results. Conclusions. Most of the MT techniques employed in our experiments improve MT of medical search queries. Intelligent training data selection in particular proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance: better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.

    CoNLL 2017 Shared Task : Multilingual Parsing from Raw Text to Universal Dependencies

    The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems. Peer reviewed.

    Khresmoi Professional: Multilingual Semantic Search for Medical Professionals

    There is increasing interest in and need for innovative solutions to medical search. In this paper we present the EU-funded Khresmoi medical search and access system, currently in year 3 of 4 of development across 12 partners. The Khresmoi system uses a component-based architecture housed in the cloud to allow for the development of several innovative applications to support target users' medical information needs. The Khresmoi search systems based on this architecture have been designed to support the multilingual and multimodal information needs of three target groups: the general public, general practitioners and consultant radiologists. In this paper we focus on the presentation of the systems to support the latter two groups using semantic, multilingual text and image-based (including 2D and 3D radiology images) search.

    Khresmoi: Multimodal Multilingual Medical Information Search

    Khresmoi is a European Integrated Project developing a multilingual multimodal search and access system for medical and health information and documents. It addresses the challenges of searching through huge amounts of medical data, including general medical information available on the internet, as well as radiology data in hospital archives. It is developing novel semantic search and visual search techniques for the medical domain. At the MIE Village of the Future, Khresmoi proposes to have two interactive demonstrations of the system under development, as well as an overview oral presentation and potentially some poster presentation.

    Khresmoi – multilingual semantic search of medical text and images

    The Khresmoi project is developing a multilingual multimodal search and access system for medical and health information and documents. This scientific demonstration presents the current state of the Khresmoi integrated system, which includes components for text and image annotation, semantic search, search by image similarity and machine translation. The flexibility in adapting the system to varying requirements for different types of medical information search is demonstrated through two instantiations of the system, one aimed at medical professionals in general and the second aimed at radiologists. The key innovations of the Khresmoi system are the integration of multiple software components in a flexible, scalable medical search system, the use of annotation cycles including manual correction to improve semantic search, and the possibility of large-scale visual similarity search on 2D and 3D (CT, MR) medical images.

    Relatório de estágio em farmácia comunitária (Internship report in community pharmacy)

    Internship report completed as part of the Integrated Master's degree in Pharmaceutical Sciences (Mestrado Integrado em Ciências Farmacêuticas), submitted to the Faculdade de Farmácia da Universidade de Coimbra.

    Formalization of the Czech morphology system with respect to automatic processing of Czech texts

    Detailed morphological description of word forms represents one of the most important conditions of successful automatic processing of linguistic data. The system of categories and their values used for the description is the subject of the first part of the thesis. The basic principle, the so-called Golden rule of morphology, states that every word form has to be described by the system unambiguously. The existence of variants of word forms and whole paradigms, however, complicates the accomplishment of this rule. We introduce so-called mutations as an extension of the variants, to be able to include other sets of word forms with the same description (for instance, multiple word forms of Czech personal pronouns). We divide mutations into two kinds: global ones, describing all word forms of a paradigm, and inflectional ones, for description at the word-form level. This division enables us to express their various combinations. We do not use features of style for the mutation division, for they are subjective. With consistent use of the categories called Inflectional Mutation and Global Mutation, the Golden rule of morphology will always be valid. The concept of multiple lemma, which describes lemma variants, is introduced in a chapter dealing with lemmatization. We give a detailed description of so-called...